Puntarenas Province
MessIRve: A Large-Scale Spanish Information Retrieval Dataset
Valentini, Francisco, Cotik, Viviana, Furman, Damián, Bercovich, Ivan, Altszyler, Edgar, Pérez, Juan Manuel
Information retrieval (IR) is the task of finding relevant documents in response to a user query. Although Spanish is the world's second most spoken native language, current IR benchmarks lack Spanish data, hindering the development of information access tools for Spanish speakers. We introduce MessIRve, a large-scale Spanish IR dataset with around 730 thousand queries from Google's autocomplete API and relevant documents sourced from Wikipedia. MessIRve's queries reflect diverse Spanish-speaking regions, unlike other datasets that are translated from English or do not consider dialectal variation. Its large size also lets it cover a much wider variety of topics than smaller datasets. We provide a comprehensive description of the dataset, comparisons with existing datasets, and baseline evaluations of prominent IR models. Our contributions aim to advance Spanish IR research and improve information access for Spanish speakers.
- North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
- North America > Mexico (0.04)
- South America > Colombia > Bogotá D.C. > Bogotá (0.04)
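As a rough illustration of what "baseline evaluations of prominent IR models" usually compute, the sketch below scores each query's ranked document list against its set of relevant documents using nDCG@k with binary relevance, then averages over queries. The choice of nDCG@10, the helper names, and the toy query and document IDs are assumptions for illustration only. This is not MessIRve's published evaluation protocol.

```python
import math


def dcg_at_k(ranked_ids, relevant_ids, k):
    """Discounted cumulative gain with binary relevance (hypothetical helper)."""
    return sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the top result is divided by log2(2) = 1
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Normalize DCG by the best achievable DCG for this query."""
    ideal_hits = min(len(relevant_ids), k)
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    if ideal == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(ranked_ids, relevant_ids, k) / ideal


def mean_ndcg(run, qrels, k=10):
    """Average nDCG@k over every query that has relevance judgments.

    run:   query ID -> ranked list of document IDs returned by a model
    qrels: query ID -> set of relevant document IDs
    """
    scores = [ndcg_at_k(run.get(q, []), rel, k) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy run and judgments; all IDs are made up for illustration.
    run = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5"]}
    qrels = {"q1": {"d1"}, "q2": {"d2"}}
    print(round(mean_ndcg(run, qrels, k=10), 4))
```

Averaging per-query scores, rather than pooling all hits, weights every query equally. Queries a model never answered are scored as empty rankings instead of being silently skipped.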
Tech billionaire on journey to immortality says there is a 'low probability' humans will survive without AI
Johnson spends millions every year trying to make his organs resemble those of an 18-year-old male. A tech billionaire on a quest to reverse the aging process believes that it is unlikely humanity will survive without the assistance of artificial intelligence (AI). Bryan Johnson, a 46-year-old tech entrepreneur, spends millions yearly on a team of experts who monitor his health and conduct experiments. The goal: get his organs to look and act like those of an 18-year-old. His regimens include a strict 8:30 p.m. bedtime, taking 111 pills daily, collecting his stool samples, and wearing a small device attached to his penis to monitor nighttime erections.
- North America > United States > New York (0.05)
- North America > United States > California > Yuba County > Linda (0.05)
- North America > United States > California > San Bernardino County > Loma Linda (0.05)
Evaluating Self-Supervised Speech Representations for Indigenous American Languages
Chen, Chih-Chen, Chen, William, Zevallos, Rodolfo, Ortega, John E.
The application of self-supervision to speech representation learning has garnered significant interest in recent years, due to its scalability to large amounts of unlabeled data. However, much progress, both in pre-training and in downstream evaluation, has remained concentrated in monolingual models that only consider English. Few models consider other languages, and even fewer consider indigenous ones. In our submission to the New Language Track of the ASRU 2023 ML-SUPERB Challenge, we present an ASR corpus for Quechua, an indigenous South American language. We benchmark the efficacy of large SSL models on low-resource ASR for Quechua and 6 other indigenous languages, including Guarani and Bribri. Our results show surprisingly strong performance by state-of-the-art SSL models, suggesting that large-scale models can generalize to real-world data.
- South America > Brazil (0.05)
- North America > Canada > Ontario > Toronto (0.05)
- South America > Bolivia (0.05)
Turning the Tide with AI and HPC
With the country's unique position within the Ring of Fire, such natural hazards have become part and parcel of everyday life in Japan. Accordingly, the nation is considered a model for disaster preparedness: each resident is advised to carry a fireproof evacuation bag with first aid, sanitation products, food, and water. Meanwhile, buildings constructed after 1981 are required to have earthquake-resistant structures, meaning thicker beams, pillars, and walls, as well as shock absorbers to reduce shaking in taller buildings. And yet, the 2011 Great East Japan Earthquake came as a huge shock, literally. On March 11, 2011, the Tohoku region along Japan's eastern coast was rocked for six minutes by a magnitude 9.0 earthquake, the strongest in the country's recorded history.
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.09)
- North America > Costa Rica > Puntarenas Province (0.07)
- North America > Costa Rica > Guanacaste Province (0.07)
SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets
Yuan, Ann, Ippolito, Daphne, Nikolaev, Vitaly, Callison-Burch, Chris, Coenen, Andy, Gehrmann, Sebastian
NLP researchers need more, higher-quality text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web, such as WikiBio, are noisy and can include undesired biases. Moreover, data sourced from the web is often included in datasets used to pretrain models, leading to inadvertent cross-contamination of training and test sets. In this work we introduce a novel method for efficient dataset curation: we use a large language model to provide seed generations to human raters, thereby changing dataset authoring from a writing task to an editing task. We use our method to curate SynthBio, a new evaluation set for WikiBio composed of structured attribute lists describing fictional individuals, mapped to natural language biographies. We show that our dataset of fictional biographies is less noisy than WikiBio, and also more balanced with respect to gender and nationality.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Leisure & Entertainment > Sports (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Climate-driven statistical models as effective predictors of local dengue incidence in Costa Rica: A Generalized Additive Model and Random Forest approach
Vásquez, Paola, Loría, Antonio, Sanchez, Fabio, Barboza, Luis A.
Climate has been an important factor in shaping the distribution and incidence of dengue cases in tropical and subtropical countries. In Costa Rica, a tropical country with distinctive micro-climates, dengue has been endemic since its introduction in 1993, inflicting substantial economic, social, and public health repercussions. Using the number of reported dengue cases and climate data from 2007-2017, we fitted prediction models using a Generalized Additive Model (GAM) and a Random Forest (RF) approach, which allowed us to retrospectively predict dengue occurrence in five climatologically diverse municipalities around the country.
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
- Asia > China > Guangdong Province (0.14)
- Africa > Liberia (0.06)